projpredSEM

Projection predictive variable selection for Bayesian regularized SEM

Sara van Erp

Utrecht University

Goal of projpredSEM


Context: Regularized SEM, i.e., models with many parameters and a penalty function (frequentist) or shrinkage prior (Bayesian).


Goal: Provide a more formal approach for selecting parameters (and thus models) in Bayesian regularized SEM.

Regularization in MIMIC models

MIMIC model drawn with https://semdiag.psychstat.org

Bayesian regularized SEM

A shrinkage prior takes the role of the penalty:

\[ \text{posterior} \propto \text{likelihood} \times \text{prior} \]

Ideal shrinkage prior:

  1. Peaked around zero, to shrink irrelevant parameters toward zero
  2. Heavy tails, to leave truly relevant parameters largely unshrunken

Many different shrinkage priors exist (see e.g., Van Erp, Oberski, and Mulder (2019)).
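One well-known prior that combines both properties is the horseshoe, which places a heavy-tailed half-Cauchy scale on each coefficient:

\[
\beta_j \mid \lambda_j, \tau \sim N(0, \tau^2 \lambda_j^2), \qquad \lambda_j \sim C^+(0, 1),
\]

where the global scale \(\tau\) pulls all coefficients toward zero, while the heavy-tailed local scales \(\lambda_j\) allow truly nonzero coefficients to escape the shrinkage.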

Advantages of Bayesian regularized SEM


  • Intuitive interpretation
  • Automatic uncertainty estimates
  • Incorporation of prior information
  • Flexibility in the choice of shrinkage prior
  • Automatic estimation of the penalty parameter

Disadvantages of Bayesian regularized SEM


  • Computationally expensive
  • Limited availability of user-friendly software (see Van Erp (2023))
  • Parameters are not automatically set to zero

Why is projpredSEM needed?


In Bayesian regularized SEM, parameters are not automatically set to zero.

  • Van Erp, Oberski, and Mulder (2019) showed that different conditions require different credible intervals for selection
  • Zhang, Pan, and Ip (2021) showed that different conditions require different (and ultimately arbitrary) types of selection criteria
  • Selection criteria based on marginal posteriors might perform differently than criteria based on the joint posterior

Projection predictive variable selection: General approach

Goal: Find a smaller submodel that predicts practically as well as the larger reference model.

  1. Specify a reference model
  2. Project the posterior information of the reference model onto the candidate models
  3. Select the candidate model with the best predictive performance

See e.g., Piironen and Vehtari (2017), Pavone et al. (2020), Piironen, Paasiniemi, and Vehtari (2020), or McLatchie et al. (2023)
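Formally, the projection in step 2 finds submodel parameters \(\theta_\perp\) whose predictive distribution is as close as possible, in Kullback-Leibler divergence, to that of the reference model:

\[
\theta_\perp = \arg\min_{\theta} \; \mathrm{KL}\left( p(\tilde{y} \mid \mathcal{D}) \,\|\, q(\tilde{y} \mid \theta) \right)
\]

so the submodel inherits the information in the reference posterior rather than being refit to the data from scratch.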

Projection predictive variable selection for the MIMIC model

library(lavaan)
library(brms)
library(projpred)

# MIMIC model: one factor measured by y1-y5, regressed on covariates x1-x10
mod <- 'F =~ y1 + y2 + y3 + y4 + y5
        F ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10'

# Step 1: obtain factor scores from a lavaan fit
fit.lavaan <- sem(mod, data = df)
fs <- lavPredict(fit.lavaan, method = "Bartlett")
df$fs <- as.vector(fs)

# Step 2: fit the reference model, regressing the factor scores on the
# covariates with a horseshoe shrinkage prior on the coefficients
prior_hs <- prior(horseshoe(), class = "b")
refm_fit <- brm(fs ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
                data = df,
                prior = prior_hs)
refm_obj <- get_refmodel(refm_fit)

# Step 3: cross-validated projection predictive variable selection
cvvs <- cv_varsel(
  refm_obj,
  cv_method = "kfold",
  K = 10
)
plot(cvvs)
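After cv_varsel(), projpred can suggest a submodel size and project the reference posterior onto the selected covariates. The sketch below uses function names from recent projpred releases (argument names have changed across versions, so treat it as illustrative):

size <- suggest_size(cvvs)        # heuristic submodel size from the CV results
rk <- ranking(cvvs)               # predictor ranking from the forward search
prj <- project(cvvs, nterms = size)  # project onto the selected submodel
prj_draws <- as.matrix(prj)       # projected posterior draws for the submodel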

Example results: 10 covariates, with only x5 and x7 being relevant

Some preliminary simulation results

False positive rate when \(p = 20\) and \(N = 100\)

Power when \(p = 20\) and \(N = 100\)

False positive rate when \(p = 70\) and \(N = 100\)

Power when \(p = 70\) and \(N = 100\)

Discussion


When would we expect projpredSEM to be beneficial?

  • High \(\frac{p}{N}\) ratio
  • High multicollinearity


Note: projpredSEM is much slower than traditional criteria.

Future directions


  1. Extensive high-dimensional simulation
    • Based on real world data (e.g., genetic or neuroimaging)
    • Evaluating: false positive rate, power, and predictive performance
  2. Extension to other SEMs
    • Other models that might benefit from this approach?
    • Might require a novel implementation of the algorithm

Questions/ideas?


Feel free to reach out during this conference, or via e-mail: s.j.vanerp@uu.nl.

References

McLatchie, Yann, Sölvi Rögnvaldsson, Frank Weber, and Aki Vehtari. 2023. “Robust and Efficient Projection Predictive Inference.” arXiv. http://arxiv.org/abs/2306.15581.
Pavone, Federico, Juho Piironen, Paul-Christian Bürkner, and Aki Vehtari. 2020. “Using Reference Models in Variable Selection.” arXiv. http://arxiv.org/abs/2004.13118.
Piironen, Juho, Markus Paasiniemi, and Aki Vehtari. 2020. “Projective Inference in High-Dimensional Problems: Prediction and Feature Selection.” Electronic Journal of Statistics 14 (1). https://doi.org/10.1214/20-EJS1711.
Piironen, Juho, and Aki Vehtari. 2017. “Comparison of Bayesian Predictive Methods for Model Selection.” Statistics and Computing 27 (3): 711–35. https://doi.org/10.1007/s11222-016-9649-y.
Van Erp, Sara. 2023. “Bayesian Regularized SEM: Current Capabilities and Constraints.” Psych 5 (3): 814–35. https://doi.org/10.3390/psych5030054.
Van Erp, Sara, Daniel L. Oberski, and Joris Mulder. 2019. “Shrinkage Priors for Bayesian Penalized Regression.” Journal of Mathematical Psychology 89 (April): 31–50. https://doi.org/10.1016/j.jmp.2018.12.004.
Zhang, Lijin, Junhao Pan, and Edward Haksing Ip. 2021. “Criteria for Parameter Identification in Bayesian Lasso Methods for Covariance Analysis: Comparing Rules for Thresholding, p-Value, and Credible Interval.” Structural Equation Modeling: A Multidisciplinary Journal 28 (6): 941–50. https://doi.org/10.1080/10705511.2021.1945456.